Abstract: Large surveys of the local Universe have shown that galaxies with different intrinsic properties, such as colour, luminosity and morphological type, display a range of clustering amplitudes. Galaxies are therefore not faithful tracers of the underlying matter distribution. This modulation of galaxy clustering, called bias, contains information about the physics behind galaxy formation. It is also a systematic to be overcome before the large-scale structure of the Universe can be used as a cosmological probe. Two types of approaches have been developed to model the clustering of galaxies. The first class is empirical and filters or weights the distribution of dark matter to reproduce the measured clustering. In the second approach an attempt is made to model the physics which governs the fate of baryons in order to predict the number of galaxies in dark matter haloes. I will review the development of both approaches and summarize what we have learnt about galaxy bias.
1 Introduction

It has long been known that the distribution of galaxies on the sky is clumpy rather than random. Huge surveys of galaxies in the local Universe have further revealed that different types of galaxies are clustered in different ways. If galaxies are grouped into samples according to intrinsic properties such as their luminosity, colour or morphology, then the measured clustering varies depending on the characteristics of the galaxies under consideration (Norberg et al. 2001, 2002; Zehavi et al. 2002, 2011). Fig. 1 shows this for galaxies from the two-degree field galaxy redshift survey which have been divided into two classes according to their spectral type. Galaxies with "early" or "passive" spectral types trace out a different pattern of large-scale structure than the galaxies with "late" or "active" types. The early types delineate tighter filaments and the cores of clusters, whereas the late types sample the outer parts of these structures and appear more diffuse.

Such differences are driven by the variation, with environment and halo mass, of the processes which shape the formation and evolution of galaxies. The fact that the clustering patterns of different kinds of galaxies look different implies that measurements of galaxy clustering have the potential to tell us something useful about the nature and strength of these processes. To realize this, we need theoretical models which can describe the large-scale structure in the galaxy distribution and connect this to the underlying physics.

The large-scale structure of the galaxy distribution is also used to constrain the values of the basic cosmological parameters, including the equation of state of the dark energy. The distortion of the clustering signal due to the gravitationally induced peculiar motions of galaxies provides a measurement of the rate at which structure is growing, which in turn depends on the cosmic expansion history (Guzzo et al. 2008; Wang 2008). The apparent location of baryonic acoustic oscillation (BAO) features in the power spectrum or correlation function provides a geometrical test, measuring the redshift-distance relation (Percival et al. 2007; Cabré & Gaztañaga 2009; Sánchez et al. 2009, 2012). The power of large-scale structure probes depends on how well we can model galaxy bias. For example, in BAO studies, the measured power spectrum is often divided by a featureless reference spectrum to remove the overall shape of the spectrum from the analysis. However, this shape contains further cosmological information: if we can predict the form of the galaxy bias, we can infer the shape of the matter power spectrum. Galaxy bias is therefore a "nuisance" parameter or systematic in large-scale structure probes. If we can model bias, we can enhance the scientific performance of wide-field galaxy surveys by marginalizing over this parameter.

In this article I will first review empirical approaches to modelling galaxy clustering, explaining how these developed as the quality of N-body simulations of hierarchical clustering of the dark matter improved. In the second half I will discuss physical approaches to predicting galaxy bias and give an overview of what such models have told us.



2 Empirical models of galaxy clustering

The central pillar of the paradigm for the large-scale structure of the universe is gravitational instability. Small perturbations in the matter density, seeded during inflation, are amplified by gravitational instability. The early stages of this process can be followed using perturbation theory (Bernardeau et al. 2002). Unless specialized assumptions are made, the later, nonlinear stages of structure formation can only be modelled through numerical simulation (Davis et al. 1985).

Figure 1: The distribution of galaxies with early (red points) and late (blue points) spectral types in a volume-limited sample (just faintwards of L*), drawn from the two-degree field galaxy redshift survey. The early- and late-type galaxies trace out different features of the cosmic web. Adapted from Norberg et al. (2002).

N-body simulations of the hierarchical growth of perturbations in the density of the Universe have played a central role in shaping the current cosmological model (Springel et al. 2006). According to these calculations, the correlation function of the dark matter at the present day cannot be described by a simple power law. The correlation function of the mass today in a cold dark matter universe with a cosmological constant is plotted in Fig. 2. The correlation function of galaxies in a flux-limited survey, roughly the clustering of L* galaxies, is also shown for contrast (Baugh 1996). In this case, the correlation function is impressively close to a power law over more than three decades in pair separation. The effective galaxy bias, defined as the square root of the ratio of the galaxy and dark matter correlation functions, is therefore scale dependent.
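The definition of the effective bias is straightforward to evaluate once both correlation functions are tabulated at the same pair separations. The sketch below assumes exactly that, and it uses two purely illustrative power laws (slopes -1.8 and -2.0) rather than measured data; the real ΛCDM matter correlation function is not a power law. The function name `effective_bias` is hypothetical.

```python
import numpy as np

def effective_bias(xi_gal, xi_dm):
    """Return b_eff(r) = sqrt(xi_gal(r) / xi_dm(r)) at matching separations r."""
    xi_gal = np.asarray(xi_gal, dtype=float)
    xi_dm = np.asarray(xi_dm, dtype=float)
    ratio = xi_gal / xi_dm
    # The square root is only meaningful where both functions have the same sign.
    if np.any(ratio < 0.0):
        raise ValueError("xi_gal and xi_dm must have the same sign at every r")
    return np.sqrt(ratio)

# Toy example: a power-law galaxy xi and a steeper, hypothetical matter xi.
r = np.array([0.5, 5.0, 20.0])      # pair separation in h^-1 Mpc
xi_gal = (r / 5.0) ** -1.8
xi_dm = (r / 5.0) ** -2.0
b = effective_bias(xi_gal, xi_dm)   # increases with r: the bias is scale dependent
```

Because the two toy curves have different slopes, their ratio, and hence the bias, changes with separation, which is the point made in the text.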

Early N-body simulations lacked the resolution to reveal any irregularities in the structure of dark matter haloes. Large-volume simulations suitable for following fluctuations on scales of tens of megaparsecs were only able to resolve haloes of group and cluster mass. Motivated by analytic calculations which explained the large correlation lengths of galaxy groups through the clustering of high peaks in a Gaussian density field (Kaiser 1984), the first attempts to model the spatial distribution of galaxies used the smoothed density field of the dark matter (Davis et al. 1985; White et al. 1987). Cole et al. (1997) assumed that the probability of finding a galaxy was some empirical function of the smoothed density field, with parameters tuned to reproduce the galaxy correlation function. This approach has continued to be developed, with the introduction of the idea of stochastic bias (Dekel & Lahav 1999), in which the overdensity in the galaxy distribution can be written as a nonlinear function of the overdensity in the matter distribution with a scatter. This framework has been
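A toy version of stochastic bias helps to make the idea concrete: the galaxy overdensity is a nonlinear mean function of the matter overdensity plus random scatter. As an assumption for illustration, the mean relation below is a quadratic in the matter overdensity (one common choice; the Dekel & Lahav formalism allows a general nonlinear mean), and the parameters `b1`, `b2` and `sigma` are hypothetical.

```python
import numpy as np

def stochastic_galaxy_overdensity(delta_m, b1=1.5, b2=0.3, sigma=0.2, seed=0):
    """Toy nonlinear, stochastic bias: delta_g = <delta_g | delta_m> + scatter.

    The conditional mean is assumed to be quadratic in delta_m; the quadratic
    term is shifted so that it does not change the sample-averaged delta_g.
    """
    rng = np.random.default_rng(seed)
    delta_m = np.asarray(delta_m, dtype=float)
    mean_relation = b1 * delta_m + 0.5 * b2 * (delta_m**2 - np.mean(delta_m**2))
    # Gaussian scatter around the mean relation: the "stochastic" part.
    scatter = rng.normal(0.0, sigma, size=delta_m.shape)
    return mean_relation + scatter

# With sigma = 0 the relation is deterministic; with sigma > 0 galaxies at the
# same matter overdensity can have different galaxy overdensities.
deterministic = stochastic_galaxy_overdensity(np.linspace(-0.5, 1.0, 5), sigma=0.0)
```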
As codes became more efficient at calculating the gravitational forces between large numbers of particles and the processing speed of computers increased, it became possible to resolve haloes approaching galactic masses. The clustering of haloes depends, to a first approximation, on halo mass, with cluster-mass haloes being much more strongly clustered than haloes which might host the Milky Way (Cole & Kaiser 1989; Mo & White 1996). This led to models in which the form of the measured galaxy clustering could be obtained by applying a suitable weighting to haloes, which varies with halo mass (Jing et al. 1998). This is the forerunner of today's halo occupation distribution models, in which the weighting is expressed in terms of the mean number of galaxies per halo, as described later.

With further improvements to the simulations, it became possible to resolve structure inside dark matter haloes (Klypin et al. 1999; Moore et al. 1999). Haloes form through mergers and the accretion of mass. With sufficient resolution, the central regions of the accreted haloes can be preserved for many orbits, whilst the outer parts are stripped off. This prompted a new generation of modelling in which resolved subhaloes were associated with galaxies. In an early example of what today would be called "subhalo abundance matching", Colín et al. (1999) were able to match the observed power-law clustering of galaxies by selecting all subhaloes above some threshold circular velocity (see Fig. 3).

Figure 2: The clustering in the matter distribution, as quantified through the two-point correlation function. The lines show measurements from N-body simulations of a ΛCDM cosmology at different epochs, with the uppermost curve corresponding to the present day. The points show a measurement of the galaxy correlation function, which, unlike that of the dark matter, is well described by a power law in pair separation. The effective galaxy bias, the square root of the ratio of the galaxy and matter correlation functions, is shown in the lower panel and is scale dependent. Based on a figure from Jenkins et al. (1998).

Figure 3: An attempt to reproduce the observed clustering of galaxies by associating galaxies with subhaloes with effective circular velocities above some threshold value (dashed, dot-dashed and solid lines). The clustering of subhaloes is different from that of the overall dark matter (shown by the dotted line), and by tuning the circular velocity which defines the sample, a good match can be obtained with the observed galaxy clustering (shown by the points). Reproduced from Colín et al. (1999).

So how can the power-law galaxy correlation function be understood, given the shape of the dark matter correlation function? Benson et al. (2000) described the predictions of their galaxy formation model in these terms, and argued that a power law could be obtained for the galaxy correlation function if the "right" number of galaxy pairs were predicted in each halo. Models which were set up to reproduce the galaxy luminosity function were found to predict a power-law galaxy correlation function in a ΛCDM cosmology. Fig. 4 shows the components of the galaxy correlation function. The clustering of the haloes occupied by galaxies is shown by the heavy solid line. Each halo has unit weight in this example. The curve turns over at small pair separations due to an exclusion effect: if haloes got any closer to one another, they would be identified as a single, more massive halo by a percolation group finder. Considering only the dark matter particles contained within occupied dark matter haloes (long-dashed line) overpredicts the small-scale clustering. The number of galaxy pairs within a halo does not increase with halo mass as steeply as the number of particle pairs, so a lower clustering amplitude is predicted on small scales (light solid line).
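The pair-counting argument can be checked with a few lines of arithmetic. The sketch assumes equal-mass simulation particles (so particle pairs grow roughly as the square of halo mass) and a hypothetical galaxy occupation that rises more slowly than halo mass, with Poisson statistics so that the mean number of ordered galaxy pairs is the square of the mean occupation. The slope `alpha = 0.8`, the mass scale `m1` and the particle mass are illustrative assumptions, not values from Benson et al. (2000).

```python
def particle_pairs(halo_mass, particle_mass=1.0e9):
    """Number of distinct particle pairs in a halo made of equal-mass particles."""
    n = halo_mass / particle_mass
    return 0.5 * n * (n - 1.0)

def galaxy_pairs(halo_mass, m1=1.0e12, alpha=0.8):
    """Mean number of distinct galaxy pairs for a hypothetical <N> = (M/m1)**alpha.

    For a Poisson-distributed number of galaxies, <N(N-1)> = <N>**2.
    """
    n_mean = (halo_mass / m1) ** alpha
    return 0.5 * n_mean**2

# Pair weight of a 1e14 halo relative to a 1e12 halo (masses in Msun/h).
ratio_particles = particle_pairs(1.0e14) / particle_pairs(1.0e12)  # roughly 1e4
ratio_galaxies = galaxy_pairs(1.0e14) / galaxy_pairs(1.0e12)       # 100**1.6
```

Weighting by particles gives massive haloes far more small-separation pairs than weighting by galaxies, which is why the particle curve in Fig. 4 overpredicts the small-scale clustering.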

Today, these approaches have crystallized into two schemes: halo occupation distribution (HOD) modelling and subhalo abundance matching (SHAM).

Figure 4: Reproducing the clustering of galaxies in ΛCDM. The correlation function of haloes which contain galaxies is shown by the heavy solid line. This curve turns over below r ~ 0.5 h^-1 Mpc due to an exclusion effect which prevents haloes overlapping. The correlation function of the dark matter particles in these haloes is shown by the long-dashed line; this puts too many pairs in massive haloes and leads to an overprediction of the small-scale clustering. The number of galaxies predicted by a galaxy formation model set up to reproduce the luminosity function gives a reduced number of pairs by comparison with the particle case, and is in excellent agreement with the observed galaxy clustering. Based on a figure in Benson et al. (2000).

HOD modelling has its roots in the clump model of Neyman & Scott (1952). In its modern form, HOD modelling took off around the start of the millennium, spurred on by the physical modelling described in the second part of this article. The HOD is a parametrization of the mean number of galaxies per halo. The HOD is split into contributions from central galaxies and satellite galaxies (Zheng et al. 2005). Central galaxies are typically modelled using a softened step function, which encapsulates the transition from haloes which are not massive enough to host a galaxy which meets the observational selection, to the mass for which all central galaxies are included. The mean number of satellite galaxies per halo is described by a power law, which reaches unity at a higher halo mass than the central HOD (Cooray & Sheth 2002). The canonical form used to model the HOD of optically selected galaxy samples is shown by the fit in the left panel of Fig. 8.
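The canonical central-plus-satellite form described above can be sketched as follows. The error-function step for centrals and the cut-off power law for satellites follow the form popularized by Zheng et al. (2005); multiplying the satellite term by the central term is a choice some authors make and is adopted here as an assumption. All parameter values are illustrative only; in practice they are fitted to a clustering measurement. The names `HODParams`, `n_central`, `n_satellite` and `n_total` are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class HODParams:
    # Illustrative values only; masses are log10(M / [Msun/h]).
    log_m_min: float = 12.0   # mass at which <N_cen> = 0.5
    sigma_logm: float = 0.2   # softening of the central step
    log_m0: float = 12.0      # satellite cut-off mass
    log_m1: float = 13.3      # mass scale of the satellite power law
    alpha: float = 1.0        # satellite power-law slope

def n_central(mass, p):
    """Softened step: <N_cen> = 0.5 * [1 + erf((log10 M - log10 M_min) / sigma)]."""
    return 0.5 * (1.0 + math.erf((math.log10(mass) - p.log_m_min) / p.sigma_logm))

def n_satellite(mass, p):
    """Power law in (M - M0) / M1, reaching about unity above the central step."""
    m0 = 10.0 ** p.log_m0
    if mass <= m0:
        return 0.0
    return n_central(mass, p) * ((mass - m0) / 10.0 ** p.log_m1) ** p.alpha

def n_total(mass, p):
    """Mean number of galaxies per halo of the given mass."""
    return n_central(mass, p) + n_satellite(mass, p)

p = HODParams()
```

With these values a 10^12 h^-1 Msun halo hosts a central half of the time and no satellites, while the mean satellite number passes through one roughly a factor of twenty higher in mass.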

A limitation of the HOD approach is that it is descriptive rather than predictive. Given an observational measurement of the clustering of galaxies, the parameters of the HOD can be constrained to reproduce this clustering, returning an interpretation of the measurement in terms of the number of galaxies per halo. The basic HOD machinery cannot make a prediction for a new clustering measurement, for example with a different galaxy selection or at a different redshift. However, refinements to the HOD model to include galaxy luminosity and colour have been devised (Skibba & Sheth 2009). As we will see later, the canonical form of the HOD outlined above does not apply to all galaxy selections, and there is no way to anticipate this without trying to implement a physical model of the galaxy population. Lastly, the basic assumption behind HOD modelling, that the clustering of dark matter haloes depends solely on halo mass, has recently been demonstrated to be inaccurate (Gao et al. 2005; Croton et al. 2006; Gao & White 2007; Angulo et al. 2008).

Subhalo abundance matching (SHAM) is an even simpler approach to realising a galaxy distribution in an N-body simulation. The fundamental assumption behind SHAM is that there is a monotonic relation between a galaxy property, e.g. stellar mass, and the mass of the subhalo which hosts the galaxy. This relation is also assumed to have zero scatter. The subhaloes from the simulation are then ranked in mass, breaking each halo into its component subhaloes. A volume-limited sample of galaxies, e.g. generated from a measurement of the galaxy luminosity function, is then also ranked by the galaxy property (in this example, luminosity) and the two lists are paired off, with the most luminous galaxy being matched up with the most massive subhalo, and so on until the end of the list is reached (Vale & Ostriker 2004; Conroy et al. 2006). In the simulation, the mass estimated for subhaloes can be affected by stripping, so the mass of the subhalo at infall is used in the SHAM procedure.
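The zero-scatter ranking step can be written in a few lines. The sketch assumes that the galaxy catalogue and the simulation cover the same volume, so that pairing ranked lists is equivalent to matching cumulative number densities; subhaloes left over once the galaxy list is exhausted receive no galaxy (NaN). The function name `abundance_match` is hypothetical, and the input masses are meant to be infall masses, as in the text.

```python
import numpy as np

def abundance_match(subhalo_infall_mass, galaxy_luminosity):
    """Zero-scatter SHAM: the i-th most luminous galaxy goes to the i-th most massive subhalo."""
    m = np.asarray(subhalo_infall_mass, dtype=float)
    lum = np.asarray(galaxy_luminosity, dtype=float)
    n = min(m.size, lum.size)
    halo_order = np.argsort(m)[::-1][:n]    # indices of subhaloes, most massive first
    lum_sorted = np.sort(lum)[::-1][:n]     # luminosities, most luminous first
    assigned = np.full(m.size, np.nan)      # unmatched subhaloes host no galaxy
    assigned[halo_order] = lum_sorted
    return assigned

print(abundance_match([5.0, 1.0, 3.0], [10.0, 30.0, 20.0]))  # [30. 10. 20.]
```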

SHAM seems to provide surprisingly good descriptions of observational samples (Conroy et al. 2006; Moster et al. 2010). This is all the more remarkable when one considers that no distinction is made regarding where the subhalo came from; that is, regardless of whether it was part of a cluster-mass halo or an isolated halo, the same connection to a galaxy property is assumed (Watson & Conroy 2013). One might imagine that environmental factors would change the nature of the connection between subhalo mass and galaxy property for a satellite galaxy in a cluster. In SHAM, the subhalo mass is frozen at infall into a larger structure. Subsequently, the satellite galaxy could continue to form stars, using up any available reservoir of cold gas, which would appear to change the subhalo mass-galaxy property relation.

The SHAM approach has been extended to cope with scatter in the galaxy property-halo mass relation (Moster et al. 2010; Rodríguez-Puebla et al. 2012). The assumption which underpins SHAM has been evaluated by Simha et al. (2012) using the output of a gas dynamics simulation. These authors found that the simulation produced relations between selected galaxy properties and subhalo mass which were monotonic, but with scatter. The scatter caused the clustering in a catalogue constructed by applying the SHAM hypothesis to differ somewhat from that in the original simulation output.
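One simple way to introduce scatter into the matching, shown here as an illustrative assumption rather than the method of the papers cited above, is to perturb the ranking proxy (log infall mass) with Gaussian noise before ranking. This leaves the galaxy property distribution unchanged, but it also slightly alters the mean galaxy property at fixed subhalo mass; published treatments handle that effect more carefully. The function name and the default `sigma_dex` are hypothetical.

```python
import numpy as np

def abundance_match_with_scatter(subhalo_infall_mass, galaxy_luminosity,
                                 sigma_dex=0.15, seed=0):
    """SHAM with scatter introduced by perturbing log10(infall mass) before ranking."""
    rng = np.random.default_rng(seed)
    m = np.asarray(subhalo_infall_mass, dtype=float)
    lum = np.asarray(galaxy_luminosity, dtype=float)
    n = min(m.size, lum.size)
    # Gaussian scatter (in dex) on the ranking proxy; sigma_dex = 0 recovers zero-scatter SHAM.
    proxy = np.log10(m) + rng.normal(0.0, sigma_dex, size=m.size)
    halo_order = np.argsort(proxy)[::-1][:n]
    assigned = np.full(m.size, np.nan)
    assigned[halo_order] = np.sort(lum)[::-1][:n]
    return assigned

scattered = abundance_match_with_scatter(np.logspace(10, 13, 200), np.arange(150.0))
```

Building two mock catalogues from the same subhaloes, with and without scatter, and comparing their correlation functions is one way to see the kind of clustering difference reported by Simha et al. (2012).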

The connection between empirical models of galaxy clustering based on the smoothed distribution of matter and those which start from haloes has recently been made (Cacciato et al. 2012). In the next section we discuss a more physical approach which does not rely upon existing clustering data being available.

3 Physical modelling of galaxy formation

By itself, the cold dark matter model says nothing directly about galaxy formation. Inferences can be drawn about the sequence of galaxy formation, based on how structures grow in the dark matter. However, without an attempt at a physically motivated calculation of the fate of baryons in a cold dark matter universe, there is little hope of learning much about galaxy formation or of understanding the implications of observations of high-redshift galaxies for the cold dark matter cosmology (for reviews see Baugh 2006 and Benson 2010).

White & Rees (1978) argued that galaxy formation is a two-phase process, with the bulk of the mass undergoing a dissipationless collapse which is responsible for building the gravitational potential wells or haloes in which galaxies form. The baryonic component of the universe is able to dissipate energy, and therefore to collapse down to smaller scales, forming denser units which retain their identity within the cluster. This model was able to explain the appearance of clusters of galaxies. However, without an additional process to reduce the efficiency of galaxy formation in shallow gravitational potential wells, the predicted luminosity function is much steeper than is observed at the faint end.

This pioneering work, along with a clutch of papers published around the same time looking at the radiative cooling of gas within gravitational potential wells, laid the groundwork for modern galaxy formation theories. The break in the galaxy luminosity function can be understood by comparing the time taken for gas to cool with the age of the universe. The time taken for all of the gas within a halo to cool radiatively increases with halo mass. This is because cooling is a two-body process (collisionally excited radiative transitions or bremsstrahlung) whose rate depends on the square of the gas density. In hierarchical models, more massive haloes tend to form later, when the density of the universe is lower. It is possible for the cooling time of the gas to exceed the Hubble time, thus limiting the supply of cold gas to form a galaxy (see the review of Fred Hoyle's contributions to galaxy formation theory by Efstathiou 2003).
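The density argument can be made explicit with the standard definition of the radiative cooling time: the thermal energy density divided by the energy lost per unit volume per unit time. Here n is the total particle number density, T the gas temperature and Λ(T) the cooling function; assuming, for simplicity, fixed electron and hydrogen fractions x_e and x_H of the total number density:

```latex
t_{\rm cool}
  = \frac{3}{2}\,\frac{n\,k_{\rm B}T}{n_{\rm e}\,n_{\rm H}\,\Lambda(T)}
  = \frac{3\,k_{\rm B}T}{2\,x_{\rm e}\,x_{\rm H}\,n\,\Lambda(T)}
  \;\propto\; \frac{T}{\rho_{\rm gas}\,\Lambda(T)},
\qquad n_{\rm e} = x_{\rm e}\,n,\quad n_{\rm H} = x_{\rm H}\,n .
```

Because the energy loss rate scales as n^2 while the thermal energy scales as n, the cooling time scales as 1/ρ_gas at fixed temperature, so the lower-density gas in later-forming, more massive haloes can have t_cool longer than the Hubble time.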

The first papers to incorporate these ideas fully into the cold dark matter cosmology, introducing the semi-analytical methodology, were published in 1991 (White & Frenk 1991; Cole 1991; Lacey & Silk 1991). This approach tries to follow a wide range of the processes which are thought to be important in determining the fate of the baryons. This is a daunting task. At the time, theories of star formation were rudimentary at best. There has been much progress in this area since 1991, but we are still a long way from having a reliable description of the process which underpins galaxy formation. The regulation of star formation efficiency comes from the stars themselves. Stars above ≈ 5-8 times the mass of the Sun end their life in a Type II supernova, which injects substantial amounts of energy and momentum into the interstellar medium (ISM). This alters the state of the gas in the ISM, perhaps leading to the ejection of gas from the galactic disk or even the dark matter halo. This process is known as supernova feedback and is critical to the success of any model of galaxy formation.

The absence of a precise description of key processes, such as star formation and supernova feedback, may lead one to consider giving up any hope of ever understanding galaxy formation. Instead, in semi-analytical modelling an attempt is made to write down the differential equation which gives the current best-bet model of how the system behaves. As our understanding develops, or when new observations clarify how a process works, the model can be improved. The differential equation may contain a free parameter. Often there is little guidance as to the appropriate range of values to take for the parameter. In such instances, the only approach is to be pragmatic and see what the model predicts for different parameter values. By comparing the model predictions to observations, the value of the parameter is set as the one which gives the most faithful reproduction of the data. This procedure is exactly what physicists do when attempting to describe complex phenomena: start off with a simple model, which can be adjusted or refined to improve the description of the observations. I will give an example of this principle in action in the next section.

The semi-analytical framework allows us to model a range of processes together, within the cosmological setting of the formation of structure in the dark matter. The ability to follow the interplay between processes is essential in studying galaxy formation. The models solve the set of differential equations which govern the flow of mass and metals between different reservoirs of baryons: hot gas, cold gas and stars (Fig. 5). The output of the models is the full star formation and chemical enrichment history for a wide range of galaxies, including mergers between galaxies.
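To illustrate what "solving for the flow of mass between reservoirs" means in practice, here is a deliberately minimal toy, not the equations of any published model: hot gas cools onto the cold reservoir on a fixed timescale, cold gas forms stars at a rate M_cold / tau_star, supernova feedback reheats cold gas at beta times the star formation rate, and a fraction `recycled` of the mass turned into stars is returned instantaneously to the cold gas. Metals, mergers and halo growth are omitted, and every parameter name and value is a hypothetical choice. The integration uses a simple forward Euler step.

```python
from dataclasses import dataclass

@dataclass
class Reservoirs:
    hot: float
    cold: float
    stars: float

def evolve(state, t_end, dt=1.0e-3, tau_cool=1.0, tau_star=2.0, beta=1.0, recycled=0.4):
    """Forward-Euler integration of a toy hot gas / cold gas / stars system."""
    hot, cold, stars = state.hot, state.cold, state.stars
    for _ in range(int(round(t_end / dt))):
        cooling = hot / tau_cool   # hot -> cold
        sfr = cold / tau_star      # star formation rate psi
        reheat = beta * sfr        # supernova feedback: cold -> hot
        hot += dt * (reheat - cooling)
        # Only the non-recycled fraction of psi is locked up in long-lived stars.
        cold += dt * (cooling - (1.0 - recycled) * sfr - reheat)
        stars += dt * (1.0 - recycled) * sfr
    return Reservoirs(hot, cold, stars)

final = evolve(Reservoirs(hot=1.0, cold=0.0, stars=0.0), t_end=5.0)
```

Because every term that leaves one reservoir enters another, the total baryonic mass is conserved, which is a useful sanity check on any such set of equations.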

Semi-analytical modelling has some features which might be perceived as limitations or drawbacks. One example is the generality of the assumptions which are needed to be able to calculate the fate of the baryonic component. Another is the "deterministic" way in which processes such as supernova feedback are modelled. In the semi-analytical model, the mass loading of the supernova-driven wind is specified by choosing model parameters, and precisely this amount of gas is ejected from the ISM. In a gas dynamics simulation in which the wind is fully coupled to the hydrodynamics equations (note that this is not generally the case; often a semi-analytical model of feedback is inserted into the simulation instead), the same number of supernovae could result in a very different mass of gas being ejected. The mass loading could be intricately linked to the resolution of the simulation.

Figure 5: The flow of mass and metals between reservoirs of hot gas, cold gas and stars. Semi-analytical models of galaxy formation solve the differential equations which describe the transfer of material between these reservoirs. Reproduced from Cole et al. (2000).

Nevertheless, despite the progress made over the past twenty years, there is still widespread mistrust of semi-analytical modelling. This has led to a burgeoning reductionist movement in galaxy formation in which simplified models have been devised with the aim of elucidating how galaxies form. Examples include the "bathtub" and "reservoir" models (Bouché et al. 2010; Davé et al. 2013). These calculations are inspired by models of supply and demand from economics, and track the inflow (sources) of gas into haloes and the "sinks" of cold gas in star formation and supernova feedback. In their simplest form, the models follow one galaxy per halo, and invoke ad-hoc efficiency factors to specify the inflow of gas as a function of halo mass, without any attempt to calculate the rate at which gas can cool or to explain the form of the efficiency factor. Galaxy mergers are ignored. This class of calculation effectively takes one of the equations which has been considered within semi-analytical models for more than two decades and solves it in isolation.
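In its simplest form, the single equation such a "bathtub" model solves is a gas budget: dM_gas/dt = inflow - SFR - eta × SFR, with SFR = M_gas / tau_star. The sketch below assumes a hypothetical constant inflow rate (rather than one tied to halo mass through an efficiency factor), ignores recycling, and uses illustrative values for the mass loading `eta` and the timescale `tau_star`. It shows the characteristic behaviour that the gas reservoir settles into an equilibrium in which SFR = inflow / (1 + eta).

```python
def equilibrium_sfr(inflow_rate, eta):
    """Steady state of dM_gas/dt = inflow - (1 + eta) * SFR = 0."""
    return inflow_rate / (1.0 + eta)

def bathtub(inflow_rate, eta=1.0, tau_star=1.0, t_end=20.0, dt=1.0e-3, gas0=0.0):
    """Forward-Euler integration of the bathtub gas budget; returns (gas, sfr, stars)."""
    gas, stars = gas0, 0.0
    for _ in range(int(round(t_end / dt))):
        sfr = gas / tau_star
        gas += dt * (inflow_rate - (1.0 + eta) * sfr)   # inflow minus star formation and outflow
        stars += dt * sfr
    return gas, gas / tau_star, stars

gas, sfr, stars = bathtub(inflow_rate=2.0, eta=1.0)
```

The relaxation time to equilibrium here is tau_star / (1 + eta), so after many such timescales the star formation rate simply tracks the supply of gas.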

The desire for a better grasp of how galaxy properties are shaped by different processes is understandable, but it is not clear that it can be usefully gained from such stripped-down approaches. The perceived "complexity" of semi-analytical modelling is actually the great strength of the technique. The ability to model the interplay between processes is the key to building a realistic model of galaxy formation. By taking a more complete view of galaxy formation rather than a selective one, the consequences of the calculation (the predictions of the model) are more far-reaching and therefore more tightly constrained by observations. If the model seems complex, then this is simply a reflection of the nature of the underlying processes, such as star formation and heating by supernovae.



Semi-analytical modelling of galaxy formation is complementary to the approach of using a gas dynamics simulation, with the two techniques having many aspects in common. In general, gas dynamics simulations rely on fewer assumptions to follow some of the processes in galaxy formation. For example, the treatment of gas cooling in semi-analytical models assumes spherical symmetry, whereas this is not necessary in a hydrodynamics simulation. Nevertheless, in carefully controlled comparisons, the modelling of gas cooling in semi-analytical models can produce the same results as are obtained in the hydrodynamical simulation (Yoshida et al. 2002; Helly et al. 2003; De Lucia et al. 2010). In other areas, the two methods are more similar than many people realize. A good illustration is star formation, which is firmly "sub-grid" in simulations which aim to follow more than one galaxy. The treatments of star formation in a gas dynamics simulation and in a semi-analytical model are very similar. Further discussion of how star formation is treated in semi-analytical models is given in the next section. A key limitation on the use of gas dynamics simulations to model galaxy clustering is their computational expense and the requirement for "sufficient" resolution in mass and length scales (Governato et al. 2007). These considerations have tended to force gas simulators to use relatively small simulation boxes, typically measured in tens of megaparsecs. This in turn limits the predictions for the clustering to pair separations of a few megaparsecs. An alternative to trying to predict the galaxy correlation function is to focus instead on how haloes are populated with galaxies. If enough different environments can be sampled, e.g. by resimulating patches from a larger volume at high resolution and with gas (Crain et al. 2009), then such a simulation could be used to predict the halo occupation distribution. One advantage of gas simulations over semi-analytics is that they can follow the redistribution of matter due to outflows of baryons. Calculations using the OverWhelmingly Large Simulations have shown that the physics of galaxy formation, particularly AGN feedback, has an impact on the distribution of matter which has implications for the interpretation of weak lensing measurements (Semboloni et al. 2011; van Daalen et al. 2011).

Nevertheless, to address clustering on scales of tens to hundreds of megaparsecs, the only viable technique is semi-analytics used in combination with large-volume, high-resolution N-body simulations of the clustering of dark matter, which we focus on in the later sections of this review.

4 Illustration: the star formation rate in galaxies

An illustration of how semi-analytical models work can be obtained by considering recent progress in how star formation is modelled within a galaxy.

The bulk of semi-analytical models attempt to predict the global star formation rate within a galaxy. The early modelling of the star formation rate was essentially based on dynamical arguments, with loose motivation coming from a comparison to the Kennicutt-Schmidt law (Bell et al.). The star formation rate, $\psi$, is often parametrized as

$$\psi = \epsilon\,\frac{M_{\rm cold}}{\tau},$$

where $M_{\rm cold}$ is the total mass of cold gas in the galaxy and $\epsilon$ is an efficiency factor which controls the fraction of cold gas which is turned into stars in the timescale $\tau$. The timescale for star formation is generally assumed to scale with the dynamical time within the galaxy:

$$\tau = \tau_{\rm dyn}\,f(V_{\rm disk}).$$

In some models, $f(V_{\rm disk}) \sim 1$; in the Cole et al. (2000) model, an explicit scaling of the star formation timescale with the circular velocity of the disk, $f(V_{\rm disk}) \sim (V_{\rm disk}/200\,{\rm km\,s^{-1}})^{\alpha_*}$, was implemented to allow the model to produce a better match to the observed gas fraction-luminosity relation for spiral galaxies. Hence, in the most general case, two parameters are required to set the star formation rate: $\epsilon$ and $\alpha_*$. These parameters are set by choosing values, running the model and then comparing the model predictions to observables. The key observables for constraining the values of these star formation parameters are the gas fraction-luminosity relation, the galaxy luminosity function and the colour-magnitude relation.
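The two-parameter star formation law above translates directly into a function. In this sketch the values of `epsilon` and `alpha_star` are deliberately left as required arguments, because the text stresses that they are set by comparison with observations; the numbers used in the example are illustrative only, as are the units (solar masses and Gyr, giving a rate in solar masses per Gyr).

```python
def star_formation_rate(m_cold, tau_dyn, v_disk, epsilon, alpha_star):
    """psi = epsilon * M_cold / tau, with tau = tau_dyn * (V_disk / 200 km/s)**alpha_star.

    alpha_star = 0 recovers the f(V_disk) ~ 1 case; v_disk is in km/s.
    """
    tau = tau_dyn * (v_disk / 200.0) ** alpha_star
    return epsilon * m_cold / tau

# Illustrative values: 1e10 Msun of cold gas, tau_dyn = 0.1 Gyr, V_disk = 200 km/s.
psi = star_formation_rate(1.0e10, 0.1, 200.0, epsilon=0.01, alpha_star=-1.5)
```

With a negative `alpha_star`, discs with lower circular velocity have longer star formation timescales and therefore retain more cold gas, which is the behaviour needed to match the gas fraction-luminosity relation of spiral galaxies.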

High-resolution imaging of galaxies at different wavelengths has revealed that star formation activity correlates better with the molecular hydrogen content of galaxies than with the overall cold gas mass. Lagos et al. (2011b) investigated more general star formation models in the GALFORM semi-analytical model, implementing different empirical and theoretically motivated star formation laws (see also Cook et al. 2010 and Fu et al. 2010). The most successful of these was the empirical star formation law proposed by Blitz & Rosolowsky (2006), who suggested that the observational data could be explained if the ratio of molecular to atomic hydrogen is set by the pressure in the mid-plane of galactic disks; gas disks with higher pressure have a higher fraction of H2.
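The pressure-based partition of cold gas can be sketched with the power-law form usually associated with Blitz & Rosolowsky (2006), R_mol = Σ_H2 / Σ_HI = (P / P0)^α, where P is the mid-plane pressure. P0 and α are the two observationally determined quantities mentioned below; they are required arguments here, and the values in the example are placeholders, not the published fit. Helium is ignored when splitting the gas, which is a simplifying assumption.

```python
def molecular_fraction(midplane_pressure, p0, alpha):
    """f_mol = R / (1 + R), with R = Sigma_H2 / Sigma_HI = (P / P0)**alpha."""
    if midplane_pressure <= 0.0:
        raise ValueError("mid-plane pressure must be positive")
    r_mol = (midplane_pressure / p0) ** alpha
    return r_mol / (1.0 + r_mol)

def split_cold_gas(m_cold, midplane_pressure, p0, alpha):
    """Return (M_H2, M_HI) for a given cold gas mass, ignoring helium."""
    f_mol = molecular_fraction(midplane_pressure, p0, alpha)
    return m_cold * f_mol, m_cold * (1.0 - f_mol)

# Placeholder parameters: at P = P0 the gas is half molecular, half atomic.
h2, hi = split_cold_gas(1.0e10, midplane_pressure=2.0, p0=2.0, alpha=0.9)
```

In a model of this kind the star formation rate is then tied to the molecular component rather than to the total cold gas mass.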

This work illustrates the modularity of semi-analytical modelling and how it provides a framework in which new and improved descriptions of various processes can be readily implemented. The Blitz & Rosolowsky star formation law involves two observationally determined "parameters". Whereas in the original parametrization of the star formation rate there was little guidance about the range of parameter values which should be considered, there is now a much smaller volume of parameter space to search (at least once the Blitz & Rosolowsky law has been adopted). Furthermore, as the modelling of star formation becomes more sophisticated, the range of predictions that can be made by the model expands. Rather than simply outputting the cold gas mass of galaxies, the model now predicts the atomic and molecular hydrogen contents, meaning that it should also be able to reproduce the mass functions of HI and H2, their evolution and their relation to other galaxy properties (Lagos et al. 2011a). By combining GALFORM with the photon-dominated region model of Bell et al. (2006), it is also possible to predict the emission in different carbon monoxide transitions, and to make contact with observations from ALMA (Lagos et al. 2012).

Hence by adopting the improved star formation 
model, the parameter space open to the model has 
shrunk in volume and the constraints on the model 
have increased through the capability to make new 
predictions which must match the available observa- 
tions. 

5 Predictions for galaxy clustering

The combination of a semi-analytical model of galaxy formation with a cosmological N-body simulation extends the capability of the models to make predictions for the spatial distribution of galaxies (Kauffmann et al. 1999; Benson et al. 2000). The models follow the physics of the baryonic component of the universe to predict how many galaxies populate dark matter haloes as a function of their mass and formation history, and what the properties of these galaxies are. The semi-analytical model therefore predicts the mean number of galaxies per halo, and it was the description of the model output in these terms which helped to stimulate the development of HOD modelling.
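To make "mean number of galaxies per halo" concrete, here is a sketch of one commonly used five-parameter HOD: an error-function step for central galaxies and a power law for satellites, modulated by the central term. This particular functional form is an assumption chosen for illustration; it is not necessarily the parameterization fitted in any of the figures, and the parameter values in the example are hypothetical.

```python
import math


def mean_n_central(mass, log_m_min, sigma_log_m):
    """<N_cen>(M) = 0.5 * [1 + erf((log10 M - log10 M_min) / sigma_logM)]."""
    return 0.5 * (1.0 + math.erf((math.log10(mass) - log_m_min) / sigma_log_m))


def mean_n_satellite(mass, log_m_min, sigma_log_m, m0, m1, alpha):
    """<N_sat>(M) = <N_cen>(M) * ((M - M0) / M1)**alpha for M > M0, else 0."""
    if mass <= m0:
        return 0.0
    return mean_n_central(mass, log_m_min, sigma_log_m) * ((mass - m0) / m1) ** alpha


def mean_n_total(mass, log_m_min, sigma_log_m, m0, m1, alpha):
    """Mean total occupation: centrals plus satellites."""
    return (mean_n_central(mass, log_m_min, sigma_log_m)
            + mean_n_satellite(mass, log_m_min, sigma_log_m, m0, m1, alpha))


# Hypothetical parameters (masses in Msun/h).
params = dict(log_m_min=12.0, sigma_log_m=0.2, m0=1e12, m1=1e13, alpha=1.0)
for log_m in (11.5, 12.0, 13.0, 14.0):
    print(log_m, round(mean_n_total(10 ** log_m, **params), 3))
```

The central term saturates at one galaxy per halo, while the satellite term keeps growing with halo mass; this monotonic shape is what a selection by stellar mass tends to produce.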

Figure 6: The scale-dependent bias of haloes of different mass, as measured from a very large volume N-body simulation. Each panel corresponds to a different redshift, as labelled. The halo mass range and the measured asymptotic bias are given by the legend. If the asymptotic bias described the halo power spectrum, the ratio of the halo power spectrum to the linear power spectrum multiplied by the square of this bias would be unity. The clustering of haloes measured from the simulation deviates strongly from a ratio of unity, which indicates that the halo bias is scale dependent. Furthermore, the shape of these curves is different from that corresponding to the nonlinear matter power spectrum divided by the linear theory spectrum (shown by the dashed line). Reproduced from Angulo et al. (2008).

Figure 7: The predicted scale-dependent bias in the galaxy distribution. As in the previous figure, the power spectrum measured for different galaxy selections is divided by the linear theory power spectrum multiplied by the square of the asymptotic bias. Different colours correspond to different selections: red and orange show the predicted clustering for flux limited samples, the blue curves show the power spectrum for red galaxies and the green curves show galaxies with strong emission lines. Reproduced from Angulo et al. (2008).

Figure 8: The form of the halo occupation distribution predicted by two different semi-analytic galaxy formation models, the models of Bower et al. (2006) and De Lucia & Blaizot (2007). The HOD for galaxies selected according to a different intrinsic property is shown in each panel: left - stellar mass, middle - cold gas mass, right - star formation rate. In all cases, the samples have been ranked in terms of the intrinsic property, and the same abundance of objects is considered. The HOD predicted for cold gas and star formation rate selected samples has a different form from that for stellar mass selected samples, with a peaked HOD for central galaxies. The dashed curves show how well parametric equations for the HOD can reproduce the forms predicted in the models. For stellar mass samples, a five-parameter fit gives a good match to the model results. For cold gas or star formation rate samples, a nine-parameter HOD is needed. Reproduced from Contreras et al.

The form of galaxy bias can be understood by first looking at the clustering of dark matter haloes. The canonical model is that the clustering of haloes can be described by multiplying the matter power spectrum by the square of an asymptotic bias factor. Formally, the bias factor should be applied to the linear power spectrum of matter fluctuations. Angulo et al. (2008) investigated this hypothesis with a moderate resolution N-body simulation of a very large cosmological volume, measuring $1340\,h^{-1}\,{\rm Mpc}$ on a side. Fig. 6 shows the ratio of the power spectrum measured for different samples of dark matter haloes to a scaled linear theory power spectrum. The scaling is the square of the asymptotic bias, which is measured from the simulation on very large scales (small $k$). This ratio deviates strongly from unity at quite large scales, typical of those used to fit the BAO. This means that a simple bias squared times the linear theory spectrum is not a good way to describe halo clustering. If the linear power spectrum is replaced by the nonlinear matter power spectrum in the simulation, there is some improvement, but there are still substantial deviations, as shown by the discrepancy between the coloured curves and the dashed black line in Fig. 6. This disagreement is particularly strong at high redshift, where the resolved haloes correspond to higher peaks in the density field than they do at lower redshifts.
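The quantity plotted in Fig. 6 can be sketched in a few lines: estimate the asymptotic bias from the ratio of halo to linear power on the largest scales, then divide the halo spectrum by the linear spectrum scaled by that bias squared. The unweighted mean over modes below a hypothetical cut `k_max` is a simplification assumed here; a real measurement would bin and weight the modes. Function names are hypothetical.

```python
import math
from statistics import fmean


def asymptotic_bias(k, p_halo, p_lin, k_max):
    """Estimate b_inf from the largest scales: b_inf^2 = mean(P_halo / P_lin) for k < k_max."""
    ratios = [ph / pl for kk, ph, pl in zip(k, p_halo, p_lin) if kk < k_max]
    if not ratios:
        raise ValueError("no wavenumbers below k_max")
    return math.sqrt(fmean(ratios))


def bias_deviation(p_halo, p_lin, b_inf):
    """P_halo / (b_inf^2 P_lin): equal to 1 at every k only for scale-independent bias."""
    return [ph / (b_inf ** 2 * pl) for ph, pl in zip(p_halo, p_lin)]


# Synthetic example: a halo spectrum with bias 2 and a mild scale-dependent term.
k = [0.005 * i for i in range(1, 61)]              # h/Mpc
p_lin = [1.0 / kk for kk in k]                     # toy linear spectrum
p_halo = [4.0 * (1.0 + kk ** 2) * pl for kk, pl in zip(k, p_lin)]
b = asymptotic_bias(k, p_halo, p_lin, k_max=0.0175)
print(round(b, 4), round(bias_deviation(p_halo, p_lin, b)[-1], 3))
```

Any departure of the returned ratio from unity at larger $k$ is the scale dependence that a constant $b^2 P_{\rm lin}$ model cannot capture.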

The next step in the calculation is to combine the large-volume N-body simulation with a semi-analytical model of galaxy formation. This is currently the only way to make predictions for galaxy clustering on scales of tens of megaparsecs and above. Current simulations which follow the hydrodynamics of the gas are restricted to volumes which are several thousand times smaller, and can only reliably predict galaxy clustering out to pair separations of a few megaparsecs. Fig. 7 reveals that both the asymptotic bias and the form of the scale dependence of the bias depend upon how galaxies are selected (Angulo et al. 2008). This in turn has implications for the apparent positions of the BAO when observed using different galaxy tracers.

Finally, one might ask: given the uncertainty in the modelling of the processes behind galaxy formation, how far can we trust the predictions of semi-analytical models for galaxy clustering? The Millennium N-body simulation of Springel et al. (2005) provides an excellent test-bed on which different semi-analytical models can be run and compared. Contreras et al. (2013) compared the clustering predictions of the Durham and Munich models (Bower et al. 2006; De Lucia et al. 2006; Bertone et al. 2007; Font et al. 2008; Guo et al. 2011). These groups have developed independent models which follow the same processes but with different implementations. These differences even extend to the first step in the galaxy formation code, that of extracting merger histories for dark matter haloes from the simulation. A summary of the comparison is given in Fig. 8. The different models give remarkably similar predictions for the HOD (an output of the models) for galaxy samples selected by stellar mass. The results are qualitatively similar for samples selected by the cold gas mass or star formation rate of the galaxies, but differ in detail. These differences can be traced to the way in which star formation is modelled by the different groups.
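The Fig. 8 caption describes the measurement behind this comparison: rank galaxies by an intrinsic property, keep a fixed abundance, and count central and satellite galaxies per halo in bins of halo mass. A minimal sketch of that procedure follows, assuming a simple catalogue format (tuples of halo id, central flag and property value) invented for illustration; the function name, the input layout and the default bin width are hypothetical.

```python
import math
from collections import defaultdict


def measure_hod(galaxies, halo_masses, n_select, dlog_m=0.25):
    """Mean central and satellite occupation per bin of log10(halo mass).

    galaxies    : list of (halo_id, is_central, property_value) tuples
    halo_masses : dict halo_id -> halo mass, for every halo (occupied or not)
    n_select    : number of galaxies kept after ranking by property_value,
                  i.e. the fixed abundance (number density times volume)
    """
    # Rank by the intrinsic property (stellar mass, cold gas mass, SFR, ...)
    # and keep the top n_select objects so all selections share one abundance.
    selected = sorted(galaxies, key=lambda g: g[2], reverse=True)[:n_select]

    def mass_bin(m):
        return math.floor(math.log10(m) / dlog_m)

    # Count every halo per bin: the HOD is normalised per halo, not per galaxy.
    haloes_per_bin = defaultdict(int)
    for m in halo_masses.values():
        haloes_per_bin[mass_bin(m)] += 1

    cen = defaultdict(int)
    sat = defaultdict(int)
    for halo_id, is_central, _ in selected:
        b = mass_bin(halo_masses[halo_id])
        if is_central:
            cen[b] += 1
        else:
            sat[b] += 1

    # Key each bin by its centre in log10(M).
    return {
        (b + 0.5) * dlog_m: (cen[b] / n, sat[b] / n)
        for b, n in sorted(haloes_per_bin.items())
    }
```

Running the same function on two models' catalogues with the same `n_select` is what makes their HODs directly comparable: any difference then reflects how each model distributes the selected galaxies among haloes, not a difference in sample size.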



6 Conclusions 

I have discussed empirical and physical methods for 
connecting dark matter haloes to galaxies. Empirical 
methods include: 1) Applying a weighting scheme to 
the smoothed dark matter density field. 2) Applying 
a weighting of dark haloes through the HOD which 
specifies the mean number of galaxy pairs as a function 
of halo mass. 3) SHAM, in which galaxies and subhalos 
are first ranked and then matched up. The physical 
approach is to carry out a calculation of the fate of 
baryons in a cold dark matter universe to predict which 
galaxies are in which haloes. Currently, this is only 
possible in cosmologically representative volumes by 
using a semi-analytical model of galaxy formation. I 
briefly reviewed how these models work and gave an
illustration of the power of this approach by discussing 
recent work on improved models of the star formation 
rate in galaxies. 
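The ranking-and-matching step behind SHAM can be illustrated in its simplest form, assuming a one-to-one monotonic match with no scatter between galaxy luminosity and subhalo mass. Practical SHAM implementations make further choices (which subhalo mass proxy to use, how much scatter to add), none of which are shown here; the function name and inputs are hypothetical.

```python
def abundance_match(galaxy_luminosities, subhalo_masses):
    """Scatter-free SHAM sketch: the i-th most luminous galaxy is placed in
    the i-th most massive subhalo.

    Returns {subhalo_index: galaxy_index}. zip() stops at the shorter ranking,
    so any surplus subhaloes (or galaxies) are left unmatched.
    """
    gal_rank = sorted(range(len(galaxy_luminosities)),
                      key=lambda i: galaxy_luminosities[i], reverse=True)
    sub_rank = sorted(range(len(subhalo_masses)),
                      key=lambda j: subhalo_masses[j], reverse=True)
    return {j: i for i, j in zip(gal_rank, sub_rank)}


# Three galaxies, four subhaloes: the least massive subhalo stays empty.
print(abundance_match([1.0, 5.0, 3.0], [10.0, 30.0, 20.0, 5.0]))
```

Because the match depends only on rank order, the method needs no model of the baryonic physics; this is what distinguishes it from the semi-analytical approach.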

Much progress has been made in understanding the 
connection between haloes and galaxies and hence of 
galaxy bias. One clear conclusion so far is that galaxy 
bias is scale dependent and depends sensitively on the 
selection applied to construct the sample. This needs 
to be taken into account when analysing large-scale 
structure as a cosmological probe so that all of the data 
can be utilized. A comparison of the predictions from different models which aim to follow the same processes in galaxy formation has given some encouraging results (Contreras et al. 2013). The predictions for samples selected by stellar mass seem robust. However, there is more discrepancy between the predictions for other galaxy selections, which are closer to what will be used in future galaxy surveys. This suggests that further theoretical work is needed if we are to maximize the potential of future surveys to tell us the values of the basic cosmological parameters and about the physics of galaxy formation.